Run-time Distributions in Passively Replicated Systems Using Timeout and Acceptance Fault Detection

Authors

  • Åsmund Tjora
  • Amund Skavhaug
Abstract

Fault tolerance based on passive replication is common in many systems. If this kind of fault tolerance mechanism is to be used in a real-time system, timing analysis is necessary, because the mechanism itself may cause timing faults. There are different ways of detecting that the primary replica has failed: one is to use a timeout, which detects crash and omission failures; another is to run acceptance tests on the results, which detects some value failures. As these two detection strategies cover different kinds of failures, it can be useful to combine them. This paper derives a mathematical model for the timing behaviour of a system with passive replication in which a combination of timeout and acceptance test is used for fault detection. Examples demonstrate how the model can be used to calculate deadline miss probabilities, and how these in turn can be used to optimise checkpoint placement.
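
The abstract does not reproduce the paper's analytical model, so the following is only a rough Monte Carlo sketch of the kind of quantity it describes, not the authors' method. Every name and parameter here is an assumption made for illustration: execution times are taken to be exponentially distributed, a crash or overrun is noticed only when the timeout expires, the acceptance test catches a value failure with probability `coverage`, and the backup is assumed fault-free and restarts the task from the beginning (no checkpoints).

```python
import random


def run_once(rng, timeout, p_crash, p_value, coverage, mean_exec, test_time):
    """Completion time of one job under the illustrative assumptions above."""
    exec_time = rng.expovariate(1.0 / mean_exec)
    if rng.random() < p_crash or exec_time > timeout:
        # Crash, omission, or overrun: noticed only when the timeout expires.
        elapsed = timeout
    else:
        # Primary delivered a result; the acceptance test then runs on it.
        elapsed = exec_time + test_time
        faulty = rng.random() < p_value
        if not (faulty and rng.random() < coverage):
            # Result accepted (correct, or an undetected value failure).
            return elapsed
    # Failover: the backup re-executes the whole task (assumed fault-free).
    return elapsed + rng.expovariate(1.0 / mean_exec) + test_time


def miss_probability(deadline, runs=100_000, seed=1, **params):
    """Estimate the fraction of jobs that finish after the deadline."""
    rng = random.Random(seed)
    misses = sum(run_once(rng, **params) > deadline for _ in range(runs))
    return misses / runs


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the sketch.
    params = dict(timeout=3.0, p_crash=0.01, p_value=0.02,
                  coverage=0.9, mean_exec=1.0, test_time=0.1)
    for deadline in (4.0, 6.0, 8.0):
        print(deadline, miss_probability(deadline, **params))
```

Sweeping `timeout` in such a simulation shows the trade-off a timing model has to capture: a short timeout detects crashes sooner but also cuts off slow, correct runs of the primary.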


Similar resources

Modeling Run-Time Distributions in Passively Replicated Fault-Tolerant Systems

Many real-time applications have strict reliability requirements in addition to their timing requirements. To fulfill these reliability requirements, it may be necessary to use a fault-tolerance strategy. An active replication strategy, where several instances of the task are run in parallel, is the preferred choice for many real-time systems, as the parallel execution of the task instances g...


Submission for the 4th CaberNet Plenary

Fault tolerance can be achieved in distributed systems by replication. However, Fischer, Lynch and Paterson have proven an impossibility result about consensus in the asynchronous system model, and similar impossibility results exist for atomic broadcast and group membership. We investigate, with the aid of an experiment conducted in a LAN, whether these impossibility results set limits to the ...


Fault Tolerant Framework in MPI-based Distributed DEVS Simulation

Distributed DEVS simulation plays an important role in solving complex problems because of the reusability and composability of its component models. With MPI as the communication middleware, distribution increases performance. But even tiny faults in the computing resources can lead to a crash. Hence fault tolerance is necessary to maintain simulation reliability. This paper introduces a D...


Online Fault Detection and Isolation Method Based on Belief Rule Base for Industrial Gas Turbines

Real-time and accurate fault detection has attracted increasing attention, driven by a growing demand for higher operational efficiency and safety of industrial gas turbines as complex engineering systems. Current methods based on condition monitoring data have drawbacks in using both expert knowledge and quantitative information for detecting faults. For this reason, this paper proposes...


A new scheduling approach supporting different fault-tolerant techniques for real-time multiprocessor systems

Many time-critical applications require predictable performance, and tasks in these applications have deadlines that must be met even in the presence of faults. Three different approaches have evolved for fault-tolerant scheduling of real-time tasks in multiprocessor systems: Triple Modular Redundancy (TMR), Primary Backup (PB), and Imprecise Computation (IC). In the TMR approach, the fault detection is by v...




Publication date: 2007